Wenjie Tian

Integrating Fine-Grained Audio-Visual Evidence for Robust Multimodal Emotion Reasoning

Jan 26, 2026
Via arXiv

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition

Jan 25, 2026
Via arXiv

VoiceSculptor: Your Voice, Designed By You

Jan 15, 2026
Via arXiv

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation

Oct 01, 2025
Via arXiv

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis

Aug 08, 2025
Via arXiv

Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought

Feb 25, 2025
Via arXiv

CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions

Jan 28, 2025
Via arXiv

Autoregressive Speech Synthesis with Next-Distribution Prediction

Dec 22, 2024
Via arXiv

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls

Dec 12, 2024
Via arXiv

Text-aware and Context-aware Expressive Audiobook Speech Synthesis

Jun 12, 2024
Via arXiv